-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replaced example folder with a simpler workflow #126
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It started off with some minor beauty comments but then I wen full typing-nerd.. sorry xD
Overall very instructive example and I have even learned a thing or two.. Was fully bamboozled by the variable number of outputs of the splitter job xD
example/config/workflow.py
Outdated
|
||
# Count lines, lines is a Variable meaning once it's computed its value can be accessed using .get() | ||
# or by converting it into a string | ||
lines = tools.WordCount(sentences).lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
propose to rename lines
to num_lines
to clarify its the number of lines in the file and not the lines itself.
first_half = tools.Head(sentences, middle).out | ||
tk.register_output('first_half', first_half) | ||
|
||
# Tell Sisyphus that this output is needed and should be linked inside the output directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the comment up three lines before the first use of tk.register_output
Co-authored-by: michelwi <[email protected]>
tk.register_output('paragraph.%02i.words' % i, wc.words) | ||
|
||
# All jobs inside the wc_per_paragraph list will be computed after this line | ||
await tk.async_run(wc_per_paragraph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this can be part of the simple example, but what do you do if you have multiple experiments and you do not want to block "later" experiements that are independent of this run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to put everything that can run in parallel into its own coroutine and await all of them at the end like this:
await tk.async_gather(coroutine1, coroutine2, coroutine3, ...)
What's missing here is a simple example of having multiple tasks for one Job and a Tasks with multiple parallel threads. Any suggestions? 🤔 |
I think the current workflow is very confusing and a bad example to get started with Sisyphus. Here is a much simpler workflow, it doesn't cover as many edge cases, but should be much easier to understand.